We will use the Eurostat indicator “Gender pay gap in unadjusted form” to explore geographical and time trends in the gender pay gap across the European Union (EU), and to compare Portugal with some other EU countries.
The objective is to look at the geographical and time trends in the data. We will answer the following questions:
Unit of Measure: % of average gross hourly earnings of men.
The indicator measures the difference between average gross hourly earnings of male paid employees and of female paid employees as a percentage of average gross hourly earnings of male paid employees. The indicator has been defined as unadjusted, because it gives an overall picture of gender inequalities in terms of pay and measures a concept which is broader than the concept of equal pay for equal work. All employees working in firms with ten or more employees, without restrictions for age and hours worked, are included.
Taken from (https://ec.europa.eu/eurostat/databrowser/view/sdg_05_20/)
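As a minimal sketch of the definition above, the unadjusted gap can be computed from two average hourly earnings figures. The helper name `pay_gap_pct` and the earnings values are illustrative assumptions, not Eurostat figures or part of any package.

```r
# Hypothetical helper: unadjusted gender pay gap, expressed as a
# percentage of men's average gross hourly earnings.
pay_gap_pct <- function(male_hourly, female_hourly) {
  stopifnot(is.numeric(male_hourly), is.numeric(female_hourly))
  (male_hourly - female_hourly) / male_hourly * 100
}

# Made-up example: men earn 20 and women 17 per hour on average.
pay_gap_pct(20, 17)
```

A positive value means that men earn more per hour on average; a value of 0 means equal average hourly earnings.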
The Eurostat gender pay gap data is from the “Structure of Earnings Survey (SES)” and is based on data reported by the countries.
The data is copyrighted by Eurostat, and the Eurostat Copyright/Licence Policy applies.
Please see (https://ec.europa.eu/eurostat/cache/metadata/en/sdg_05_20_esmsip2.htm) for further information about the data.
Selecting all the available pay gap data (indicator code sdg_05_20) from Eurostat.
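# Packages used below (assumed; the original setup chunk is not shown)
library(eurostat)    # get_eurostat(), label_eurostat()
library(data.table)  # setDT(), fifelse(), %chin%
library(magrittr)    # %>%
library(ggplot2)
library(ggrepel)     # geom_text_repel()
library(gganimate)   # transition_reveal()
# Get all EU data in one go and keep the country code (`geo_code`)
pgapEU <- get_eurostat(id = "sdg_05_20", time_format = "num") %>%
  label_eurostat(., code = "geo")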
# We will work with the `data.table` package.
setDT(pgapEU)
# Minimum and maximum available year
minYear <- min(pgapEU$time, na.rm = TRUE)
maxYear <- max(pgapEU$time, na.rm = TRUE)
#_# To do
# Best to get the map data first so we can merge it directly to the indicator data.
# mapEU <- get_eurostat_geospatial(nuts_level = 0)
# setDT(mapEU)
# Update the `geo` variable to make it print and plot friendly.
pgap <- pgapEU[, geo_orig := geo] %>%
  .[, geo := fifelse(geo_code == "DE", "Germany", geo)] %>%
  .[, geo := fifelse(grepl("^EU|^EA", geo_code), gsub("_", " ", geo_code), geo)] %>%
  # Create geo_label_right / geo_label_left so that each line is labelled only once,
  # at its last or first available year.
  .[, geo_label_right := ifelse(time == max(time), geo_code, ""), .(geo_code)] %>%
  .[, geo_label_left := ifelse(time == min(time), geo_code, ""), .(geo_code)] %>%
  .[, c("nace_r2") := NULL] %>%
  # Adding a factor time variable (with levels in reverse)
  .[, timeF := factor(time, levels = c(maxYear:minYear))]
#_# To do
# mutate (cat = cut_to_classes (values, n = 4, decimals = 1))
We will highlight some countries to compare Portugal with.
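To make the labelling and highlighting steps concrete, here is a small sketch on toy data. The country codes `AA` and `BB`, the values, and the `highlight` vector are all made up for illustration; `highlight` only stands in for the selection vector used in the plots, whose actual contents are not shown here.

```r
library(data.table)

# Toy data (made-up values): two fictional countries over three years.
toy <- data.table(
  geo_code = rep(c("AA", "BB"), each = 3),
  time     = rep(2016:2018, times = 2),
  values   = c(10, 11, 12, 20, 19, 18)
)

# Keep the country code only on each country's last year and use an
# empty string elsewhere, so a text geom labels each line exactly once.
toy[, label_right := fifelse(time == max(time), geo_code, ""), by = geo_code]

# Hypothetical highlight vector; %chin% is data.table's fast %in%
# for character vectors.
highlight <- c("AA")
toy[geo_code %chin% highlight]
```

Because the maximum year is computed within each country, a country whose series ends early still gets its label at its own last point.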
Some data summaries to understand the data that we have.
## [1] 2002 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018
# Information by country
pgap[, .(countryData = sprintf("%2s %15s: %4.0f-%4.0f (%2.0f)", geo_code, geo, min(time), max(time), .N)),
     .(geo_code, geo)] %>%
  .[, c(countryData)] %>%
  unique(.) %>% sort(.)
## [1] "AT Austria: 2006-2018 (13)"
## [2] "BE Belgium: 2006-2018 (13)"
## [3] "BG Bulgaria: 2002-2018 (14)"
## [4] "CH Switzerland: 2006-2017 (11)"
## [5] "CY Cyprus: 2002-2018 (14)"
## [6] "CZ Czechia: 2002-2018 (14)"
## [7] "DE Germany: 2006-2018 (13)"
## [8] "DK Denmark: 2006-2018 (13)"
## [9] "EA19 EA19: 2010-2018 ( 9)"
## [10] "EE Estonia: 2006-2018 (13)"
## [11] "EL Greece: 2002-2018 ( 7)"
## [12] "ES Spain: 2002-2018 (14)"
## [13] "EU27_2007 EU27 2007: 2006-2006 ( 1)"
## [14] "EU27_2020 EU27 2020: 2010-2018 ( 9)"
## [15] "EU28 EU28: 2010-2018 ( 9)"
## [16] "FI Finland: 2006-2018 (13)"
## [17] "FR France: 2006-2018 (13)"
## [18] "HR Croatia: 2010-2018 ( 6)"
## [19] "HU Hungary: 2002-2018 (14)"
## [20] "IE Ireland: 2002-2017 (13)"
## [21] "IS Iceland: 2007-2018 (12)"
## [22] "IT Italy: 2006-2018 (13)"
## [23] "LT Lithuania: 2002-2018 (14)"
## [24] "LU Luxembourg: 2006-2018 (13)"
## [25] "LV Latvia: 2006-2018 (13)"
## [26] "ME Montenegro: 2014-2014 ( 1)"
## [27] "MK North Macedonia: 2014-2014 ( 1)"
## [28] "MT Malta: 2006-2018 (13)"
## [29] "NL Netherlands: 2002-2018 (14)"
## [30] "NO Norway: 2006-2018 (13)"
## [31] "PL Poland: 2002-2018 (14)"
## [32] "PT Portugal: 2006-2018 (13)"
## [33] "RO Romania: 2002-2018 (14)"
## [34] "RS Serbia: 2014-2018 ( 2)"
## [35] "SE Sweden: 2006-2018 (13)"
## [36] "SI Slovenia: 2002-2018 (14)"
## [37] "SK Slovakia: 2002-2018 (14)"
## [38] "TR Turkey: 2006-2014 ( 2)"
## [39] "UK United Kingdom: 2002-2018 (14)"
pg01 <- pgap[geo_code %chin% ct] %>%
  ggplot(aes(x = time, y = values, color = geo, label = geo)) +
  geom_line(alpha = .8, size = 1) +
  geom_point() +
  scale_y_continuous(breaks = seq(0, 30, 5), limits = c(0, 30)) +
  scale_x_continuous(breaks = seq(2002, 2018, 2), limits = c(2002, 2019)) +
  theme(legend.position = "none") +
  # Label each line once, at its last available year
  geom_text_repel(aes(label = geo_label_right),
                  direction = "y",
                  nudge_x = .85,
                  segment.alpha = 0.2,
                  segment.color = "grey80") +
  labs(title = "Gender Pay Gap Over Time",
       x = "Year",
       y = "% Average Difference",
       caption = "Unadjusted % difference between average gross hourly earnings of male paid employees and of females.")
pg01
pg01 +
  geom_text_repel(aes(label = geo_code),
                  direction = "y",
                  nudge_x = .75,
                  segment.alpha = 0.7,
                  segment.colour = "grey80") +
  transition_reveal(time) +
  geom_point()
Portugal has no data available before 2006, and the EU only has data available from 2010 onwards.
pg02 <- pgap[geo_code %chin% PTEU] %>%
  ggplot(aes(x = time, y = values, color = geo, label = geo)) +
  # All countries as grey background lines for context
  geom_line(data = pgap, aes(x = time, y = values, group = geo), colour = "grey70", alpha = .5) +
  geom_line(alpha = .8, size = 1) +
  scale_y_continuous(breaks = seq(0, 30, 5), limits = c(0, 30)) +
  scale_x_continuous(breaks = seq(2002, 2018, 2), limits = c(2002, 2019)) +
  theme(legend.position = "none") +
  geom_text_repel(aes(label = geo_label_right),
                  direction = "y",
                  nudge_x = .45,
                  segment.alpha = 0.7) +
  labs(title = "Gender Pay Gap, 2002-2018",
       x = "Year",
       y = "% Average Difference",
       caption = "Unadjusted % difference between average gross hourly earnings of male paid employees and of females.")
pg02
pgap[time %in% c(2010, 2014, 2018) & geo_code %chin% ct] %>%
  ggplot(aes(x = reorder(geo_code, values), y = values, fill = timeF)) +
  geom_bar(stat = "identity", alpha = .8, width = .8, position = "dodge") +
  # facet_wrap(~time, scales = "free_x") +
  # gghighlight(geo_code == "PT") +
  labs(title = "Gender Pay Gap Over Time",
       x = "",
       y = "% Average Difference",
       fill = "",
       caption = "Unadjusted % difference between average gross hourly earnings of male paid employees and of females.") +
  coord_flip()
TO DO